PR #21708: NUMA-pin host memory buffers for D2H/H2D transfers #22243

Open
wants to merge 1 commit into
base: main

@copybara-service copybara-service bot commented Feb 3, 2025

PR #21708: NUMA-pin host memory buffers for D2H/H2D transfers

Imported from GitHub PR #21708

This ensures that the pinned host buffers used for transfers between host and device are pinned to the NUMA node closest to the device. It had a previous life as #15216.
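For illustration only, here is a minimal sketch of how the NUMA node closest to a device could be looked up on Linux, where sysfs exposes a `numa_node` file for each PCI device. The helper names `ReadNumaNode` and `NumaNodePathForPciDevice` are hypothetical and are not taken from this PR or from XLA; the actual change may determine the node differently.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper, not XLA's API: builds the Linux sysfs path that
// reports the NUMA node of a PCI device. `pci_bus_id` is assumed to be in
// the lowercase "domain:bus:device.function" form, e.g. "0000:17:00.0".
std::string NumaNodePathForPciDevice(const std::string& pci_bus_id) {
  return "/sys/bus/pci/devices/" + pci_bus_id + "/numa_node";
}

// Hypothetical helper: reads a `numa_node` file and returns the node index.
// Returns std::nullopt if the file is missing, cannot be parsed, or holds a
// negative value (the kernel writes -1 when no affinity is known).
std::optional<int> ReadNumaNode(const std::string& path) {
  std::ifstream file(path);
  int node = -1;
  if (!(file >> node) || node < 0) {
    return std::nullopt;
  }
  return node;
}
```

A caller would then bind the host buffer for that device to the returned node, and fall back to an unpinned allocation when no node is reported.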

In a benchmark that triggers large, concurrent copies from all devices to the host, the achieved D2H throughput is around 33 GiB/s with NUMA pinning on a DGX H100 node (2xCPU, 8xH100). Without pinning, the same benchmark achieves around 13.5 GiB/s.

While it is already possible to achieve the correct NUMA pinning in process-per-GPU and process-per-NUMA-node configurations using `numactl` or similar, achieving correct pinning in a process-per-node configuration requires logic inside XLA.
Copybara import of the project:

--
0eab66c by Olli Lupton [email protected]:

NUMA-pin host memory buffers for D2H/H2D transfers

--
57a4664 by Olli Lupton [email protected]:

256 byte alignment for host allocations when NUMA is not enabled

--
ad2895a by Olli Lupton [email protected]:

Address review comments

--
629777e by Olli Lupton [email protected]:

std::string_view -> absl::string_view

--
21587a5 by Olli Lupton [email protected]:

Apply @beckerhe's suggested Bazel changes

--
175c5f6 by Olli Lupton [email protected]:

add missing dependency
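
The 256-byte alignment fallback named in commit 57a4664 can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the PR: `AllocateAlignedHost`, `IsAligned` and `kHostAlignment` are hypothetical names, and the sketch allocates ordinary host memory only (no NUMA binding and no registration with the GPU driver). It relies on C++17 `std::aligned_alloc`, which is not available on every platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Alignment taken from the commit title; the constant name is hypothetical.
constexpr std::size_t kHostAlignment = 256;

// Hypothetical helper: allocates at least `bytes` bytes of host memory
// aligned to `alignment`. std::aligned_alloc expects the size to be a
// multiple of the alignment, so the request is rounded up first.
// The returned pointer must be released with std::free.
void* AllocateAlignedHost(std::size_t bytes,
                          std::size_t alignment = kHostAlignment) {
  // The alignment must be a non-zero power of two.
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  std::size_t rounded = (bytes + alignment - 1) / alignment * alignment;
  if (rounded == 0) {
    rounded = alignment;  // Avoid a zero-sized request.
  }
  return std::aligned_alloc(alignment, rounded);
}

// Returns true if `ptr` is a multiple of `alignment`.
bool IsAligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```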

Merging this change closes #21708

FUTURE_COPYBARA_INTEGRATE_REVIEW=#21708 from olupton:numa 175c5f6

@copybara-service copybara-service bot force-pushed the test_722688719 branch 18 times, most recently from d261874 to 6fc0a40 Compare February 5, 2025 15:00
PiperOrigin-RevId: 722688719